Support Data Modeling in IS 2002 

Leslie J. Waguespack, Jr., Ph.D. 
LWaguespack@Bentley.edu 
Computer Information Systems Department 
Bentley University 

Waltham, Massachusetts 02154-4705 USA 


Abstract 

No individual subject area in IS 2002 impacts more aspects of computing theory or professional
preparation than data modeling. For more than four decades the bedrock of data modeling has been
the relational data model. There are numerous extensions, variations and implementations of this
theory but its core remains the central anchor in the practice of data-driven analysis and design.
Like most theoretical foundations that have spawned application development tools and
methodologies, much of the pure theory of the relational model is obscured by necessary choices of
syntax and implementation features that in many cases complicate if not defy a student's grasp of
the theory. This is compounded by the progression from one tool or syntax to another as students
traverse their computing curricula. This is a distillation of the relational data model compact
enough to be easily committed to memory and robust enough to serve as the consistent reference to
the relational paradigm spanning IS 2002.P0 through IS 2002.7 and IS 2002.8 for computing majors,
minors and general education. In a format reminiscent of the IBM System/360 Principles of
Operation Pocket Reference (the "Green Card"), this distillation fits nicely on two sides of a
single sheet of 8.5" x 11" paper, hence a "Relational Green Card." 

Keywords: relational data model, relational paradigm, data modeling, data-driven modeling, 
relational model quick reference, data modeling pedagogy 


1. INTRODUCTION 

No individual subject area in IS 2002 impacts more aspects of computing theory or professional
preparation than data modeling. Relational database is among the first dozen learning units
designated as prerequisite to the IS 2002 curriculum in IS 2002.P0 (Gorgone et al. 2002). Five of
the model courses in IS 2002 explicitly identify database in the learning units, including the
first preprogram requirement of IS 2002. IS 2002.P0 is also frequently used as the model for
computing in general education across all college curricula. Table 1 lists all the model courses
designated in IS 2002: number, title and prerequisites. In the last column those courses
indicating required learning units in database are annotated with the number of learning units
explicitly requiring database learning compared with the total number of required learning units
designated in their descriptions. 

For more than four decades the bedrock of data modeling that underpins database pedagogy has been
the relational data model (Codd 1969, 1970). There are numerous extensions, variations and
implementations of this theory but its core remains the central anchor in the practice of
data-driven analysis and design (Chen 1976, Fagin 1981, Zaniolo 1982, Date 2004). Like most
theoretical foundations that have spawned application development tools and methodologies, much of
the pure theory of the relational model is obscured by the necessities of syntax and
implementation features that in many cases complicate if not defy a student's grasp of the theory.
This is compounded by the progression from one tool or syntax to another as students traverse
their computing curricula. 

This paper presents a distillation of the relational data model upon which is based an
undergraduate and graduate data modeling pedagogy. It is compact enough to be easily committed to
memory and robust enough to serve as the consistent reference to the relational paradigm spanning
IS 2002.P0 through IS 2002.7 and IS 2002.8. In a format reminiscent of the IBM System/360
Principles of Operation Pocket Reference (the "Green Card"), this distillation fits nicely on two
sides of a single sheet of 8.5" x 11" paper, hence a "Relational Green Card." 


#    COURSE TITLE                                                   PR     LU
P0   Personal Productivity With IS Technology                              3/21
1    Fundamentals of Information Systems                            P0
2    Electronic Business Strategy, Architecture and Design          1
3    Information Systems Theory and Practice                        1
4    Information Technology Hardware and System Software            1
5    Programming, Data, File and Object Structures                  1      2/20
6    Networks and Telecommunications                                4
7    Analysis & Logical Design                                      1      1/14
8    Physical Design and Implementation With DBMS                   5, 7   7/16
9    Physical Design and Implementation in Emerging Environments    2, 8
10   Project Management and Practice                                7      1/11

(PR = prerequisite courses; LU = learning units requiring database / total required learning units)

Table 1 

Database Content in IS 2002 Courses 


2. THE RELATIONAL PARADIGM 
WITHOUT LANGUAGE OR SYNTAX 

Every language that is invented to express concepts carries with it the understanding and the
biases of the inventor. Depending on his/her purpose(s) those biases simplify certain tasks
performed with the language but may obscure underlying concepts. 

As a special case, programming language design is further complicated by the need for feasibility
of automated translation and interoperability with other programming languages and operating
systems. Designers must consider upward, downward, and cross-compatibility within versions of a
programming language. Compromises and assumptions are chosen to make the resulting language
efficient, effective and marketable but not to clarify the underlying theory! 

The goal of this description of the relational paradigm is to strip away the extraneous facets
that programming language or tool design must use to achieve their "practical" product
requirements, and in so doing to succinctly make the underlying relational data model concepts
evident and understandable. This approach follows the success of an analogous effort to present
the core concepts of the object-oriented paradigm (Waguespack 2009). It provides a knowledge base
that both teacher and student can carry from one data modeling tool or application to another,
exposing how they treat relational paradigm concepts alike and/or how they treat them differently
in practice. 


3. ONTOLOGY OF THE 
RELATIONAL PARADIGM 

[Concept map relating relation, tuple, data attribute, domain, value and key through the
properties of remembrance, atomicity, membership IN, membership OF, instance relationship,
association, functional dependency, entity integrity and referential integrity]

Figure 1 - Relational Concept Map 

Computer science and information science categorize a domain of concepts as 1) individuals,
2) attributes, 3) relationships and 4) classes. Following that discipline this ontology of the
relational paradigm attempts to eschew the vestiges of implementation languages and development
methodologies in order to expose the core nature and value of the relational concepts. The
relational ontology is arranged as follows (and is depicted graphically in the map in Figure 1
while an illustration of a two-page rendering of the "Green Card" is found in Appendix A): 

Individuals 
- Tuple 
Attributes 
- Data Attributes 
Classes 
- Relation 
Relationships 
- Behavioral Relationships 
-- Functional Dependency 
--- Entity Integrity 
-- Association 
--- Relational Operations 
--- Join Compatibility 
--- Referential Integrity 
-- Normalization 
--- First Normal Form 
--- Second Normal Form 
--- Third Normal Form 

Table 2 

Ontology of the Relational Paradigm 

Individuals - The most concrete concept in 
the relational paradigm is the tuple. 

Tuple - A tuple corresponds 1-1 with a single concept of reality that it represents. A tuple
collects the facts that identify it as a single concept and the facts most closely identified
with it. 

Attributes - Attributes are those characteristics (facts) that describe a tuple. In the
relational paradigm attributes define data characteristics, each of which has a static and
dynamic form. A prescribed set of attributes defines what is called the structure of a tuple.
From inception to extinction the structure of a tuple is immutable. The number of attributes in
a tuple is called its degree. 

Data Attributes - Data attributes store information (data) in the tuple and implement the
property of remembrance. Remembrance is manifest in each attribute dynamically as "what is
remembered," a particular data attribute value for each tuple derived from a data attribute
domain that statically defines "what can be remembered," the possible values of the attribute. 

Classes - The relational paradigm groups individuals into a collection called a relation. The
relation corresponds directly with its mathematical antecedent where attribute values within
each tuple reflect a correspondence with the coincidence of facts in the "real world," a
correspondence (attribute relationship) that is shared by every tuple in that relation. 

Relation - The relation concept combines both a definition of structure and the collection of
tuple(s) based on that structure. A relation is defined as a fixed set of data attribute
domains. Every tuple is an instance of a specific relation and shares the same static structure
defined by that relation with every other tuple of that relation. The relation concept thereby
fuses the existence of the tuples to that of their relation; tuples cannot exist independent of
their defining relation. Tuples are said to be members of their relation. Tuples are added to or
deleted from their relation. The order of attributes in a relation is insignificant except that
the order is consistent for all tuples. A relation is also commonly called a table and each of
its tuples or instances, a row. The collection of data attribute value(s) for a particular data
attribute from every row in a table is called a column. 
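
The vocabulary so far (tuple, data attribute, domain, relation, degree, column) can be made concrete with a small sketch. The following Python is only an illustration under stated assumptions: the class name `Relation`, the `Student` example and its domains are hypothetical, domains are modeled as plain sets of permitted values, and tuple uniqueness is deliberately not enforced here.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A relation: a fixed structure (attribute -> domain) plus its member tuples."""
    name: str
    domains: dict                               # attribute name -> set of permitted values
    rows: list = field(default_factory=list)    # member tuples, each stored as a dict

    @property
    def degree(self):
        # The degree is the number of attributes in the structure.
        return len(self.domains)

    def add(self, **values):
        # A tuple must match the relation's structure exactly ...
        if set(values) != set(self.domains):
            raise ValueError("tuple does not match the relation's structure")
        # ... and each value must be drawn from its attribute's domain.
        for attr, value in values.items():
            if value not in self.domains[attr]:
                raise ValueError(f"{value!r} is outside the domain of {attr}")
        self.rows.append(dict(values))

    def column(self, attr):
        # A column collects one attribute's value from every row.
        return [row[attr] for row in self.rows]


# Hypothetical example relation.
student = Relation(
    name="Student",
    domains={"id": set(range(1, 1000)), "major": {"CIS", "ACC", "FIN"}},
)
student.add(id=1, major="CIS")
student.add(id=2, major="ACC")
```

Here `student.degree` is 2 and `student.column("major")` collects the major of every row; an attempt to add a tuple with a major outside the domain is rejected.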

Relationships - Relationships in the relational paradigm are based on the property of
remembrance and the juxtaposition of data attribute values in one or more tuples in the same or
across relations. 

Behavioral Relationships - The behavioral relationships are all based upon the data attribute
value(s) and which values are permitted to coexist in and across tuples and relations. 

Functional Dependency - In a relation a data attribute is functionally dependent when its data
attribute value is always the same in any tuple for a given value in a second data attribute. In
other words, the value of the first data attribute is determined by the value of the second; the
second attribute is sometimes called the determinant. Functional dependency expresses the
informational integrity of relations. 
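
As a rough illustration of this definition, the sketch below tests whether a dependency holds in a given set of tuples. The function name `holds` and the `enrollment` data are hypothetical. Note the limitation: inspecting the current contents of a relation can refute a functional dependency but cannot establish that the modeler intends it to hold for all future contents.

```python
def holds(rows, determinant, dependent):
    """Return True if every value of `determinant` is paired with exactly one
    value of `dependent` across the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault records the first dependent value seen for this determinant value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical example data.
enrollment = [
    {"student": 1, "course": "DB", "instructor": "Lee"},
    {"student": 2, "course": "DB", "instructor": "Lee"},
    {"student": 1, "course": "OS", "instructor": "Kim"},
]

course_determines_instructor = holds(enrollment, ["course"], ["instructor"])
student_determines_course = holds(enrollment, ["student"], ["course"])
```

In this data `course` determines `instructor`, while `student` does not determine `course` because student 1 appears with two different courses.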

Entity Integrity - Entity integrity defines the two-fold quality of tuple uniqueness in a
relation: a) every tuple in a relation is distinct in some data attribute value(s) from every
other tuple in that relation or symmetrically, b) there is a designated subset of data
attributes (column(s)) called the primary key such that the data attribute value(s) of those
data attribute(s) in that relation is distinct for all tuples and no values among them may be
null (a value which is unknown and incomparable to any other value). There may be more than one
subset of data attributes with the value characteristics of the primary key (each called a
candidate key) but only one is designated as the primary key. 
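
A minimal sketch of the primary key condition follows. It assumes that rows are dictionaries and that Python's `None` stands in for null; the function name and sample rows are hypothetical.

```python
def satisfies_entity_integrity(rows, primary_key):
    """Check that the primary key values are non-null and distinct for all rows."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in primary_key)
        if any(v is None for v in key):   # None stands in for null (an assumption)
            return False
        if key in seen:                   # two tuples share the same key value
            return False
        seen.add(key)
    return True


# Hypothetical example data.
good = [{"sid": 1, "name": "Ana"}, {"sid": 2, "name": "Ben"}]
duplicated = [{"sid": 1, "name": "Ana"}, {"sid": 1, "name": "Ben"}]
with_null = [{"sid": None, "name": "Ana"}]
```

Only `good` passes the check with `["sid"]` as the primary key; `duplicated` repeats a key value and `with_null` contains a null in the key.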

Association - An association is a relationship between tuples in the same or different
relations. Tuples are intrinsically separable by way of entity integrity. At the same time,
humans are compelled to categorize their experience of things in the physical world by
superimposing groupings that collect tuples into sets. Tuples become members in a group based
upon data attribute value(s). This property is called membership IN. This property also permits
humans to identify a tuple that is not in a set (i.e. discrimination). (Membership IN an
association is distinct from membership OF a relation that is intrinsic by way of instance
relationship.) 

Relational Operations - Membership IN is realized through relational operations keying on
relation structure and values. Each relational operation produces a real or virtual relation as
its result. The selection operation retrieves tuple(s) based upon a selection predicate testing
data attribute value(s) to determine whether each tuple is or is not in the set. Selection
predicates are based on any boolean comparison including constant values or values referenced in
data attribute value(s). The projection operation copies all the data attribute value(s) for a
particular column(s). Association between relations (or a relation and itself) is based upon
relating (matching) data attribute values in tuples of one relation with those of another. The
join operation pairs every combination of tuples from one relation with those of another
relation and copies the data attribute values from the pairs where the pairing satisfies a
selection predicate. This relational operation is called join because facts from two sources are
joined in the result. 
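
The three operations described above can be sketched in a few lines of Python. This is an illustration, not a definitive implementation: relations are modeled as lists of dictionaries, the function names and sample data are hypothetical, and the removal of duplicate rows in `project` is an assumption borrowed from the set-based view of a relation rather than something stated in the text.

```python
def select(rows, predicate):
    # Selection: keep the tuples for which the selection predicate is true.
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    # Projection: copy only the named columns, dropping duplicate results.
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def join(left, right, predicate):
    # Join: pair every combination of tuples and keep the pairs that satisfy
    # the predicate. Attribute names are assumed distinct across the two inputs.
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


# Hypothetical example data.
students = [{"sid": 1, "name": "Ana"}, {"sid": 2, "name": "Ben"}]
enrolls = [
    {"student": 1, "course": "DB"},
    {"student": 1, "course": "OS"},
    {"student": 2, "course": "DB"},
]

db_enrolls = select(enrolls, lambda r: r["course"] == "DB")
courses = project(enrolls, ["course"])
joined = join(students, enrolls, lambda s, e: s["sid"] == e["student"])
```

Each function returns a new list of rows, mirroring the statement that every relational operation produces a relation as its result, so the operations can be composed freely.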

Join Compatibility - Join compatibility requires that the values involved in comparisons (i.e.
selection predicates), whether constants or data attribute values, derive from the same data
attribute domain. 

Referential Integrity - When relations are 
devised such that a tuple in one relation 
predisposes the existence of (owns) 
tuple(s) in another, the data attribute(s) of 
the second required to join the relations is 
called a foreign key. Referential integrity 
asserts that any value found in the data 
attribute(s) of a foreign key must appear in 
a tuple of the first relation as the value of a 
candidate key or itself be null. 
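
The referential integrity rule can be checked mechanically. The sketch below assumes rows are dictionaries and `None` stands in for null; the function name `violations` and the sample relations are hypothetical. A foreign key that is only partly null is treated here as a violation unless it matches, which is one possible reading and not something the definition above settles.

```python
def violations(child_rows, foreign_key, parent_rows, candidate_key):
    """Return the child rows whose foreign key value is neither null nor
    present as a candidate key value in the parent relation."""
    parent_keys = {tuple(p[a] for a in candidate_key) for p in parent_rows}
    bad = []
    for row in child_rows:
        fk = tuple(row[a] for a in foreign_key)
        if all(v is None for v in fk):   # a wholly null foreign key is permitted
            continue
        if fk not in parent_keys:
            bad.append(row)
    return bad


# Hypothetical example data.
departments = [{"dept": "CIS"}, {"dept": "ACC"}]
students = [
    {"sid": 1, "dept": "CIS"},
    {"sid": 2, "dept": None},
    {"sid": 3, "dept": "ART"},
]

orphans = violations(students, ["dept"], departments, ["dept"])
```

Only the student referring to the nonexistent department "ART" is reported; the student with a null department is allowed.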

Normalization - Relational model consistency depends on the semantic concurrence of the
behavioral relationships and the objectives of the database modeler, the intension, (rather than
the accident of a relation's contents at any particular instant, its extension). The integrity
properties defined above enable the database modeler to devise a structure and behavior of
relations that avoid semantic discord called anomalies, the unintended loss or modification of
information by relational operations. Relations designed to avoid certain kinds of anomalies are
said to be normalized or in normal form. Normalization is the arrangement of data attributes and
their relationships among relation structures to prevent particular anomalies. 

First Normal Form - First Normal Form 
asserts that every data attribute value is 
atomic, indivisible in value or form and 
may not be operated upon except as a 
whole and single value. 
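
One common way a relation falls short of first normal form is an attribute that holds a list of values. The sketch below, with a hypothetical function name and sample data, rewrites such rows so that each row carries a single value for the attribute. What counts as "atomic" ultimately depends on the domain the modeler chooses, so this is only one illustrative case.

```python
def to_first_normal_form(rows, attr):
    """Replace each row whose `attr` holds a list of values with one row per value."""
    flat = []
    for row in rows:
        values = row[attr]
        if not isinstance(values, (list, tuple)):
            values = [values]             # already a single value
        for v in values:
            flat.append({**row, attr: v})
    return flat


# Hypothetical example data: "phone" holds a repeating group.
contacts = [
    {"sid": 1, "phone": ["555-0100", "555-0101"]},
    {"sid": 2, "phone": "555-0102"},
]

flat_contacts = to_first_normal_form(contacts, "phone")
```

After the rewrite every `phone` value is a single string, and `sid` alone no longer identifies a row; the pair (`sid`, `phone`) does.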

Second Normal Form - Second Normal Form presupposes first normal form and asserts that every
data attribute value not in the primary key is fully functionally dependent upon the primary
key. ("Fully" means applying to every data attribute of the primary key.) 
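
A violation of second normal form is a non-key attribute that depends on only part of a composite primary key. The following sketch searches a set of rows for such partial dependencies. The function names and the `enrollment` data are hypothetical, and, as with any test against current contents, a dependency that happens to hold in the data is only a candidate until the modeler confirms it is intended.

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    # True if each determinant value is paired with exactly one dependent value.
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, primary_key, non_key):
    """List (key subset, attribute) pairs where a non-key attribute is
    determined by a proper subset of the primary key."""
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(primary_key, size):
            for attr in non_key:
                if holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical example data; primary key is (student, course).
enrollment = [
    {"student": 1, "course": "DB", "title": "Databases", "grade": "A"},
    {"student": 2, "course": "DB", "title": "Databases", "grade": "B"},
    {"student": 1, "course": "OS", "title": "Operating Systems", "grade": "B"},
]

partials = partial_dependencies(enrollment, ["student", "course"], ["title", "grade"])
```

Here `title` depends on `course` alone, so the relation is not in second normal form; `grade` depends on the whole key and raises no objection.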

Third Normal Form - Third Normal Form presupposes first and second normal forms and asserts that
no data attribute outside the primary key is transitively dependent upon the primary key.
("Transitively" means an attribute(s) functionally dependent upon an attribute functionally
dependent upon an attribute [. . .] functionally dependent on the primary key.) 
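
A transitive dependency is usually removed by splitting the relation with two projections. The sketch below uses hypothetical data in which `sid` determines `dept` and `dept` determines `building`; it decomposes the relation and then joins the parts to confirm, for this data, that no information was lost. The helper `project` and all names are illustrative assumptions.

```python
def project(rows, attrs):
    # Projection with duplicate rows removed.
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical example data. Primary key: sid.
# sid -> dept and dept -> building, so building depends on sid only transitively.
student = [
    {"sid": 1, "dept": "CIS", "building": "North"},
    {"sid": 2, "dept": "CIS", "building": "North"},
    {"sid": 3, "dept": "ACC", "building": "South"},
]

# Decompose so that each fact is recorded once.
student_dept = project(student, ["sid", "dept"])
dept_building = project(student, ["dept", "building"])

# Joining the two parts on dept reproduces the original rows.
rejoined = [
    {**s, **d} for s in student_dept for d in dept_building if s["dept"] == d["dept"]
]
```

In the decomposed form the building of the CIS department is stored in a single row, so changing it cannot leave two student rows in disagreement, which is the kind of anomaly normalization is meant to prevent.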

4. DISCUSSION 

As with any formal theory, this relational ontology, as a distillation, forms the basis of, not
a complete pedagogy for, teaching and learning data-driven modeling and analysis. Pedagogical
completeness is not the intention. An effective pedagogy built upon this ontology also depends
upon the academic maturity of the audience, the curricular context of the coursework and the
expository style of the teacher. 

Although the ontology is succinct, it has proven effective as a teaching vehicle for presenting
concepts at varying levels of detail. It is elemental and defines the essential vocabulary to
frame any discussion of data modeling. Issues of data integrity can be addressed at either the
operational level as with the relational operations and their mechanics or at a conceptual level
as with the intention of the data modeler in their representation of "reality" through
normalization. The richness of the paradigm is thereby preserved and crystallized. 

The relational ontology provides both a framework for organizing pedagogy and a discipline for
choosing pedagogical instruments that consistently ground the teacher and student to the
theoretical roots regardless of which application domain or problem solving tool is chosen. 

5. SUMMARY 

This is a very short presentation of a succinct, compact description of the relational paradigm
without the embellishments or compromises often necessary to support computer-based translation
(as in a query language such as SQL or QBE) or a graphically augmented representation such as
Entity-Relationship diagrams. The ontology derives from the very earliest of conceptions of the
relational paradigm at a time before there was competition for commercial dominance, language or
methodology standardization. 

The primary value of this approach in explaining the relational paradigm is two-fold. First,
absent the accidents of implementation that accompany all programming languages, both the
student and teacher of the relational data model have a basis for discriminating between those
features that are essential to the paradigm and those that are accidental to an implementation
of it (Brooks 1987). Second, it also facilitates assessing the relational data model's role in
more advanced applications of the paradigm (e.g. query languages, embedded data languages and
application programming interfaces). 

Data modeling can be likened to a religion with its saints, zealots and heretics. For that
reason and the fact that at its core it is a framework or pattern for creating abstractions,
conceptions in the human mind, it may not be possible to find a uniquely perfect, universally
accessible depiction of the paradigm itself. As with all models, this explanatory model for the
relational paradigm cannot be judged as perfect, but perhaps it may be judged as useful. 

Green Card Illustration 

The Relational Green Card may be effectively reproduced as the front and back of a single 8.5" 
x 11" sheet of paper. Terms used with special meaning are italicized. Those initially defined 
are also bolded. 

The Relational Paradigm 

Without a Language or Syntax! 

What is the relational world all about? 

The Relational Ontology 

This ontology is consistent with the practice in computer science and information science categorizing a domain of 
concepts (i.e. individuals, attributes, classes and relationships). This ontology of the relational paradigm of data modeling 
minimizes the vestiges of implementation languages and methodologies to expose the core nature of relational concepts. 

1. Individuals 

The most concrete concept in the relational paradigm is the tuple. 

1.1. Tuple 

A tuple corresponds 1-1 with a single concept of reality that it represents. A tuple collects the facts that identify it as a 
single concept and the facts most closely identified with it. 

2. Attributes 


Attributes are those characteristics (facts) that describe a tuple. In the relational paradigm attributes define data 
characteristics - each of which has a static and dynamic form. A prescribed set of attributes defines what is called the 
structure of a tuple. From inception to extinction the structure of a tuple is immutable. The number of attributes in a tuple is 
called its degree. 

2.1. Data Attribute 

Data attributes store information (data) in the tuple and implement the property of remembrance. Remembrance is 
manifest in each attribute dynamically as “what is remembered,” a particular data attribute value for each tuple derived from 
a data attribute domain that statically defines “what can be remembered,” the possible values of the attribute. 

3. Classes 

The relational paradigm groups individuals into a collection called a relation. The relation corresponds directly with its 
mathematical antecedent where attribute values within each tuple reflect a correspondence with the coincidence of facts in 
the “real world,” a correspondence (attribute relationship) that is shared by every tuple in that relation. 

3.1. Relation 


The relation concept combines both a definition of structure and the collection of tuple(s) based on that structure. A 
relation is defined as a fixed set of data attributes. Every tuple is an instance of a specific relation and shares the same static 
structure defined by that relation with every other tuple of that relation. The relation concept thereby fuses the existence of 
the tuples to that of their relation; tuples cannot exist independent of their defining relation. Tuples are said to be members of 
their relation. Tuples are added to or deleted from their relation. The order of attributes in a relation is insignificant except 
that the order is consistent for all tuples. A relation is also commonly called a table and each of its instances, a row. The 
collection of every data attribute value(s) for a particular data attribute in a table is called a column. 


4. Relationships 


Relationships in the relational paradigm are based on the property of 
remembrance and the juxtaposition of data attribute values in one or more 
tuples in the same or across relations. 

4.1. Behavioral Relationships 

The behavioral relationships are all based upon the data attribute 
value(s) and which values are permitted to coexist in and across tuples and 
relations. 

4.1.1. Functional Dependency 

In a relation a data attribute is functionally dependent when its data 
attribute value is always the same in any tuple for a given value in a 
second data attribute. In other words, the value of the first data attribute 
is determined by the value of the second (called the determinant). 
Functional dependency expresses the informational integrity of relations. 

4.1.1.1. Entity Integrity 

Entity integrity defines the two-fold quality of tuple uniqueness in a relation: a) every tuple in a relation is distinct in
some data attribute value(s) from every other tuple in that relation or symmetrically, b) there is a designated subset of data
attributes (column(s)) called the primary key such that the data attribute value(s) in that relation is distinct for all tuples
and no values may be null (a value which is unknown and incomparable to any other value). There may be more than one subset of
data attributes with the value characteristics of the primary key (each called a candidate key) but only one is designated as
the primary key. 

[Relational Concepts map, as in Figure 1]

4.1.2. Association 

An association is a relationship between tuples in the same or different relations. Tuples 
are intrinsically separable by way of entity integrity. At the same time, humans are compelled 
to categorize their experience of things in the physical world by superimposing groupings 
that collect tuples into sets. Tuples become members in a group based upon data attribute 
value(s). This property is called membership IN. This property also permits humans to 
identify a tuple that is not in a set (i.e. discrimination). (Membership IN an association is 
distinct from membership OF a relation which is intrinsic by way of instance relationship.) 

4.1.2.1. Relational Operations 

Membership IN is realized through relational operations keying on relation structure 
and values. Each relational operation produces a real or virtual relation as its result. The 
selection operation retrieves tuple(s) based upon a selection predicate testing data 
attribute value(s) to determine whether each tuple is or is not in the set. Selection 
predicates are based on any boolean comparison including constant values or values 
referenced in data attribute value(s). The projection operation copies all the data attribute 
value(s) for a particular column(s). 

Association between relations (or a relation and itself) is based upon relating 
(matching) data attribute values in tuples of one relation with those of another. The join 
operation pairs every combination of tuples from one relation with those of another 
relation and copies the data attribute values from the pairs where the pairing satisfies a 
selection predicate. This relational operation is called join because facts from two sources 
are joined in the result. 

4.1.2.2. Join Compatibility 

Join compatibility requires that the values involved in comparisons (i.e. selection 
predicates) whether constants or data attribute values derive from the same data attribute 
domain. 

4.1.2.3. Referential Integrity 

When relations are devised such that a tuple in one relation predisposes the existence 
of (owns) tuple(s) in another, the data attribute(s) of the second required to join the 
relations is called a foreign key. Referential integrity asserts that any value found in the 
data attribute(s) of a foreign key must appear in a tuple of the first relation as the value of a candidate key or itself be 
null. 


Without syntax? 

Every language that is invented 
to express concepts carries with it 
the understanding and the biases 
of the inventor. Depending on his/her 
purpose(s) those biases 
simplify certain tasks performed 
with the language but may 
obscure the underlying concepts. 

Programming language design 
must deal with the feasibility of 
automated translation and 
interoperability with other 
programming languages and 
operating systems. Compromises 
and assumptions are chosen to 
make the resulting language 
efficient, effective and marketable. 

The goal of this description of 
the entity-relationship paradigm is 
to succinctly make the concepts 
understandable - an ambitious 
task to say the least! 

-Professor Waguespack 


4.1.3. Normalization 

Relational model consistency depends on the semantic concurrence of the behavioral relationships and the objectives of 
the database modeler, the intension, (rather than the accident of a relation's contents at any particular instant, its extension). 
The integrity properties defined above enable the database modeler to devise a structure and behavior of relations that avoid 
semantic discord called anomalies, the unintended loss or modification of information by relational operations. Relations 
designed to avoid certain kinds of anomalies are said to be normalized or in normal form. Normalization is the arrangement 
of data attributes and their relationships among relation structures to prevent particular anomalies. 

4.1.3.1. First Normal Form 

First Normal Form asserts that every data attribute value is atomic, indivisible in value or form and may not be operated 
upon except as a whole and single value. 

4.1.3.2. Second Normal Form 

Second Normal Form is first normal form and asserts that every data attribute value not in the primary key is fully 
functionally dependent upon the primary key. (“Fully” means applying to every data attribute of the primary key.) 

4.1.3.3. Third Normal Form 

Third Normal Form presupposes first and second normal forms and asserts that no data attribute outside the primary key 
is transitively dependent upon the primary key. (“Transitively” means an attribute(s) functionally dependent upon an attribute 
functionally dependent upon an attribute [. . .] functionally dependent on the primary key.) 